ABSTRACT

We introduce a new algorithm to identify multiple target concepts when data are represented by multiple instances. A multiple instance data sample is characterized by a bag that contains multiple feature vectors, or instances. Each bag is labeled as either positive or negative. However, the labels of the instances within each bag are unknown. A bag is labeled as positive if and only if at least one of its instances is positive, and negative if and only if all of its instances are negative. First, we define a fuzzy Multi-target concept Diverse Density (MDD) metric. The MDD is maximized when the target concepts correspond to dense regions in the feature space with maximal correlation to instances from positive samples and minimal correlation to instances from negative samples. Then, we develop an iterative algorithm to optimize the MDD and identify K target concepts simultaneously. The proposed algorithm, called Fuzzy Clustering of Multiple Instance data (FCMI), is tested and validated by using it to analyze data of buried landmines collected using a ground penetrating radar sensor. We show that the FCMI algorithm can identify distinct target concepts that correspond to mines of different types buried at different depths. We also show that
FCMI  can  be  used  to  label  individual  instances  within  each  bag. 

I.  Introduction 

Standard machine learning problems characterize an individual data sample by a single representative feature vector. For many applications, such as drug activity prediction [1] and landmine detection [2], each individual data sample may be represented by multiple feature vectors, each of which has an ambiguous label. Dietterich et al. [1] proposed the Multiple Instance Learning (MIL) framework for identifying and modeling such problems. Under this framework, each data sample is represented by one class-labeled “bag” that contains an arbitrary number of unlabeled “instances,” each of which is a single feature vector in the feature space. The machine learning task within the MIL framework consists of identifying bags, along with their subset of instances, that can be used to learn a classifier to label new bags.

To illustrate the need for MIL, we consider the application of landmine detection using ground penetrating radar (GPR). The GPR sensor is mounted on a vehicle and collects 3-D data as the vehicle moves. The first two dimensions (down-track and cross-track) refer to the spatial location on the ground, while the third dimension refers to depth. Typically, in labeled training data, the spatial location is known but the depth is not. To illustrate this data, in figure 1 we display the GPR signatures of the same mine type buried 3 in deep at two geographically different sites. We show only a 2-D view (down-track, depth) of the alarms. First, we note that the actual target signature does not extend over all depth values. Thus, extracting one global feature vector from the alarm may not discriminate effectively between mines and clutter. To overcome this limitation, multiple features should be extracted from small windows at different depths [3], [4]. For instance, in figure 1 we show 8 windows (typically, more overlapping windows are used). The main challenge in developing a classifier for this application is the selection of the appropriate depth for training. For instance, knowing the burial depth (3 in) in figure 1 is not sufficient to identify the best window for training. In addition to soil properties, the true signature depth depends on other factors such as mine type and environmental conditions. In figure 2 we display the GPR signatures of a large mine and a small mine. As can be seen, the signature of the large target can extend over 3 or 4 consecutive windows, while the signature of the small mine does not extend beyond one window. In Section IV, we show that, using an MIL approach, each alarm can be represented by a bag of features extracted from multiple depths. Within each bag, some features correspond to the mine signature, while others correspond to background. The label of each instance is not known.

Other applications where the MIL framework has proved to be effective include automated image annotation [5], text document classification [6], speaker identification [7], and many others [8].

(a) Target at site 1 (b) Target at site 2

Fig. 1. The depth of the target signature depends on the soil properties of the site. The same mine type is buried 3 in deep at both sites.

Since  its  formal  introduction,  MIL  research  has  focused  on 
supervised  learning.  Existing  methods  typically  rely  on  two 
main  approaches  [8].  The  first  one  concatenates  features  from 
all  instances  of  a  bag  into  one  feature  vector  and  utilizes  highly 
sparse  learning  algorithms  to  learn  relevant  features/instances 
[9],  [1],  [10].  In  the  second  approach,  first  a  collection  of 
prototypes  is  identified.  Then,  using  these  prototypes,  each  bag 
is  mapped  to  a  point  in  a  new  feature  space  and  conventional 
classification algorithms are used [11], [12], [13].

Fig. 2. The size of the target signature depends on the target type.

In this paper, we focus on unsupervised learning for multiple instance data. Our approach, called Fuzzy Clustering of Multiple Instance data (FCMI), strives to identify dense regions in the feature space with maximal correlation to instances from positive samples and minimal correlation to instances from negative samples. The proposed FCMI algorithm uses a fuzzy clustering approach [14] to extend the Diverse Density model [9] to identify multiple target concepts simultaneously.

The rest of this paper is organized as follows. In Section II, we review related work and highlight the need for our approach. In Section III, we introduce the objective function of the FCMI and derive the necessary conditions to optimize it. In Section IV, we report experimental results, and we conclude in Section V.


II. Related Work

The need to represent data samples with more than a single feature vector can be traced back to at least two major applications: the prediction of bonding activity in drug design [15] and handwritten digit recognition [16]. To the best of our knowledge, Dietterich et al. [1] were the first to formalize the definition and requirements of the traditional bag-instance Multiple Instance Learning framework. As a solution, they proposed a simple algorithm called Axis-Parallel Rectangles (APR). The APR constructs a set of boundaries in the problem feature space that enclose at least one instance from every positive sample in a training dataset, while excluding as many instances from negative data samples as possible.

The next major step in MIL research was the formulation of the Diverse Density (DD) approach [9]. In [9], the author defines the Diverse Density metric, which combines the cumulative probability that the positive bags are correlated with a given point of interest and the cumulative probability that the negative bags are not correlated with it. The DD algorithm seeks the point of interest that maximizes the DD metric. This point is called the target concept. The DD algorithm spurred several direct variations designed to improve performance or convergence efficiency. For example, the EM-DD algorithm [10] is a variation where an Expectation-Maximization algorithm is used to optimize the DD metric and identify the target concept. Another research direction has used the DD metric to analyze the relationship between regions of the feature space and bags (collectively or individually) and to identify multiple target concepts. Multiple concepts are needed to capture within-class variations. The learned concepts are then used to perform a feature space mapping (similar to a kernel space transformation) that converts the multiple instance features to single-vector features, upon which conventional learning methods can be applied. Examples of such methods include DD-SVM [11] and MILES [12]. These approaches learn the multiple concepts sequentially. First, they repeatedly optimize the single-concept DD metric using different initializations. Then, a validity measure is used to identify meaningful and diverse target concepts.

Multiple target concept learning in multiple instance data can be viewed as a clustering problem. Within the clustering community, it is well known that extracting one cluster at a time is not effective. With this approach, only points within the cluster of interest are considered inliers, and points in all other clusters are treated as outliers. Thus, when the expected number of clusters is larger than two, the outliers can outnumber the inliers, and even very robust algorithms will break down. A more common practice is to define and optimize an objective function that seeks multiple clusters simultaneously. The K-Means [17] and EM [18] algorithms fall into this category. Moreover, fuzzy objective functions [14], [19], [20], which allow data samples to belong to multiple clusters with various membership degrees, have proved to be more reliable.

In the following, we use fuzzy clustering concepts to define a Multi-target concept Diverse Density (MDD) metric. We show that multiple concepts can be identified simultaneously by optimizing the proposed MDD metric.


III.  Fuzzy  Clustering  of  Multiple  Instance  Data 

Let $\mathcal{B} = \{B_1, \dots, B_n, \dots, B_N\}$ represent the set of data samples. Each bag, $B_n = \{b_{n1}, \dots, b_{ni}, \dots, b_{nI}\}$, has $I$ instances¹, and each instance, $b_{ni} = \{b_{ni1}, \dots, b_{nif}, \dots, b_{niF}\}$, is an $F$-dimensional feature vector. In MIL, a bag is labeled as positive (class of interest), $B^+$, if and only if at least one of its instances is positive. Similarly, a bag is labeled negative, $B^-$, if and only if all of its instances are negative. We assume that our data have $N_{pos}$ positive bags and $N_{neg}$ negative bags such that $N_{pos} + N_{neg} = N$. Let $\mathcal{B}^+ = \{B_1^+, \dots, B_{N_{pos}}^+\}$ and $\mathcal{B}^- = \{B_1^-, \dots, B_{N_{neg}}^-\}$ denote the subsets of positive and negative bags, respectively.

¹ It is not required that all bags have the same number of instances. Here, we assume it is the case only to simplify notation.

In MIL, each object is represented by multiple instances, and the relevance of each instance is unknown. Typically, only one or a few instances are relevant. Thus, this type of data has an additional dimension of ambiguity, making it more appropriate to analyze with a fuzzy approach, as illustrated in figure 3. In this figure, we assume that the data have two true target concepts with centers marked as $TC_1$ and $TC_2$. We display two bags that can belong to either target concept. The first bag, $B_1$, has five instances $\{a, b, c, d, e\}$, and one of its instances, $a$, is equally close to $TC_1$ and $TC_2$. This is the same scenario encountered in clustering traditional data. Another scenario, unique to MIL data, that requires fuzzy assignment is illustrated with a second bag, $B_2 = \{A, B, C, D, E\}$. In this case, one instance, $A$, is close to $TC_1$, while a different instance of the same bag, $B$, is close to $TC_2$. In other words, the features that make $B_2$ similar to one target concept are different from the features that make the same bag similar to a different target concept. The proposed Fuzzy Clustering of Multiple Instance Data (FCMI) algorithm is designed to seek multiple target concepts simultaneously, using fuzzy membership assignment of bags to all target concepts to address both of the above scenarios.
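To make the notation concrete, the following is a minimal sketch, assuming the bags are stored as one NumPy array of shape (N, I, F) (equal bag sizes, as in the simplification of footnote 1). The instance labels in it are hypothetical: they are never observed in practice and are shown only to illustrate how the observed bag labels follow from the MIL rule.

```python
import numpy as np

def bag_label(instance_labels):
    """MIL rule: a bag is positive (1) iff at least one of its instances is
    positive, and negative (0) iff all of its instances are negative."""
    return int(np.any(np.asarray(instance_labels, dtype=bool)))

# Toy data following the notation: N bags, each with I instances of F features.
N, I, F = 4, 5, 2
rng = np.random.default_rng(0)
bags = rng.normal(size=(N, I, F))          # bags[n, i] is instance b_ni

# Hypothetical instance labels (unknown in practice), used only to show
# how the observed bag labels arise.
hidden = np.array([[0, 0, 1, 0, 0],
                   [0, 0, 0, 0, 0],
                   [1, 1, 0, 0, 0],
                   [0, 0, 0, 0, 0]])
labels = np.array([bag_label(row) for row in hidden])
print(labels)  # [1 0 1 0]
```

Only `bags` and `labels` would be available to a learner such as FCMI.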


Fig. 3. Two cases that require fuzzy assignment of a bag to multiple target concepts. The first bag, $B_1 = \{a, b, c, d, e\}$, has one instance, $a$, that is close to both target concepts $TC_1$ and $TC_2$. The second bag, $B_2 = \{A, B, C, D, E\}$, has one instance, $A$, that is close to $TC_1$ and another instance, $B$, that is close to $TC_2$.


The objective of the FCMI algorithm is to identify $K$ target concepts, $T = \{t_1, \dots, t_k, \dots, t_K\}$, that describe regions in the feature space that include as many positive instances as possible and as few negative instances as possible². Using a fuzzy approach, we assume that each bag, $B_n$, belongs to each target concept $t_k$ with a membership $u_{kn}$ such that

$$u_{kn} \in [0, 1] \quad \text{and} \quad \sum_{k=1}^{K} u_{kn} = 1. \tag{1}$$

Let $U = [u_{kn}]$ for $k = 1, \dots, K$ and $n = 1, \dots, N$. We define the fuzzy Multi-target concept Diverse Density (MDD) metric as

$$\mathrm{MDD}(T, U) = \prod_{n=1}^{N} \prod_{k=1}^{K} \big(\Pr(t_k \mid B_n)\big)^{u_{kn}^m}. \tag{2}$$

In (2), $m > 1$ is a fuzzifier that controls the fuzziness of the partition, as in the FCM [14]. The MDD in (2) is maximized when the $K$ target concepts correspond to points in the instance feature space such that each target is close to as many instances from positive bags as possible and far from as many instances from negative bags as possible (refer to (10) for the definition of $\Pr(t_k \mid B_n)$). The proposed FCMI algorithm seeks the optimal $(T, U)$ that maximizes the MDD in (2).

² Recall that we only know whether a bag is positive or negative. Labels at the instance level are not available.

Instead of maximizing (2), we minimize its negative log-likelihood:

$$J(T, U) = -\log\big(\mathrm{MDD}(T, U)\big) = \sum_{n=1}^{N} \sum_{k=1}^{K} u_{kn}^m \big(-\log \Pr(t_k \mid B_n)\big) \tag{3}$$

subject to the membership constraints in (1).
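As a quick numerical illustration of (2) and (3), here is a small sketch, assuming the bag probabilities are already available as a $K \times N$ array `P` with `P[k, n]` $= \Pr(t_k \mid B_n)$ (the values below are made up). It checks that $J$ equals $-\log$ of the MDD, so minimizing one maximizes the other.

```python
import numpy as np

def mdd(P, U, m):
    """Eq. (2): product over bags n and concepts k of Pr(t_k|B_n) ** (u_kn ** m).
    P[k, n] = Pr(t_k | B_n) in (0, 1]; U[k, n] = u_kn; m > 1 is the fuzzifier."""
    return float(np.prod(P ** (U ** m)))

def objective(P, U, m):
    """Eq. (3): J = sum_n sum_k u_kn ** m * (-log Pr(t_k | B_n))."""
    return float(np.sum((U ** m) * -np.log(P)))

# Made-up values for K = 2 concepts and N = 3 bags; columns of U sum to 1 (Eq. (1)).
P = np.array([[0.9, 0.2, 0.6],
              [0.3, 0.8, 0.5]])
U = np.array([[0.8, 0.1, 0.5],
              [0.2, 0.9, 0.5]])
m = 1.5
J = objective(P, U, m)
print(J, -np.log(mdd(P, U, m)))   # the two numbers agree
```

Swapping the rows of `U` (assigning each bag to the concept that explains it worse) increases $J$, which is the behavior the objective is meant to reward.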


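To minimize (3) with respect to $U$, we apply Lagrange multipliers $\Lambda = [\lambda_1, \dots, \lambda_N]$ and obtain

$$J(T, U, \Lambda) = \sum_{n=1}^{N} \sum_{k=1}^{K} u_{kn}^m \big(-\log \Pr(t_k \mid B_n)\big) - \sum_{n=1}^{N} \lambda_n \Big(\sum_{k=1}^{K} u_{kn} - 1\Big). \tag{4}$$

Assuming that the partial densities $\Pr(t_k \mid B_n)$, $n = 1, \dots, N$, and the columns of $U$ are independent of each other, we can reduce (4) to the following $N$ independent minimization problems:

$$J_n(T, U_n, \lambda_n) = \sum_{k=1}^{K} u_{kn}^m \big(-\log \Pr(t_k \mid B_n)\big) - \lambda_n \Big(\sum_{k=1}^{K} u_{kn} - 1\Big), \quad n = 1, \dots, N. \tag{5}$$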


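Next, we fix $T$ and set the gradient of $J_n$ to zero to obtain

$$\frac{\partial J_n}{\partial u_{qn}} = m\, u_{qn}^{m-1} \big(-\log \Pr(t_q \mid B_n)\big) - \lambda_n = 0 \tag{6}$$

and

$$\frac{\partial J_n}{\partial \lambda_n} = -\Big(\sum_{k=1}^{K} u_{kn} - 1\Big) = 0. \tag{7}$$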

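Solving (6) for $u_{qn}$ leads to

$$u_{qn} = \left[\frac{\lambda_n}{m} \cdot \frac{1}{-\log \Pr(t_q \mid B_n)}\right]^{1/(m-1)}. \tag{8}$$

Substituting (8) back into (7), we obtain

$$u_{qn} = \frac{\big(-\log \Pr(t_q \mid B_n)\big)^{1/(1-m)}}{\sum_{k=1}^{K} \big(-\log \Pr(t_k \mid B_n)\big)^{1/(1-m)}}. \tag{9}$$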
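The membership update (9) is a closed-form normalization. The following is a minimal NumPy sketch of it, assuming `P[k, n]` $= \Pr(t_k \mid B_n)$. The small floor `eps` on $-\log \Pr$ is our own safeguard for the case $\Pr(t_k \mid B_n) = 1$ and is not part of the derivation. The sketch also verifies the stationarity condition (6): $m\, u_{qn}^{m-1}(-\log \Pr(t_q \mid B_n))$ should give the same $\lambda_n$ for every $q$.

```python
import numpy as np

def update_memberships(P, m, eps=1e-12):
    """Eq. (9): u_qn = d_qn^(1/(1-m)) / sum_k d_kn^(1/(1-m)), d_kn = -log Pr(t_k|B_n).
    P[k, n] = Pr(t_k | B_n); m > 1. The eps floor is an assumption (not from the
    text) that avoids 0 ** negative when Pr(t_k | B_n) == 1."""
    d = np.maximum(-np.log(P), eps)
    w = d ** (1.0 / (1.0 - m))          # smaller -log Pr (better fit) -> larger weight
    return w / w.sum(axis=0, keepdims=True)

P = np.array([[0.9, 0.5],
              [0.5, 0.9]])
m = 1.5
U = update_memberships(P, m)

# Stationarity check of Eq. (6): every row of lam should hold the same lambda_n.
lam = m * U ** (m - 1) * -np.log(P)
print(U)
```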


To optimize $J$ with respect to the target concepts $T$, we first need to define the probability of a bag of instances. Recall that a bag is positive if and only if at least one of its instances is positive, and negative if and only if all of its instances are negative. In this paper, we use the noisy-OR model [9], [21]:

$$\Pr(t_k \mid B_n) = \begin{cases} 1 - \prod_{i=1}^{I} \big(1 - \Pr(b_{ni} \in t_k)\big) & \text{if } \mathrm{label}(B_n) = 1,\\[4pt] \prod_{i=1}^{I} \big(1 - \Pr(b_{ni} \in t_k)\big) & \text{if } \mathrm{label}(B_n) = 0, \end{cases} \tag{10}$$


where $\mathrm{label}(B_n) = 1$ for positive bags ($B^+$) and $\mathrm{label}(B_n) = 0$ for negative bags ($B^-$). In (10), $\Pr(b_{ni} \in t_k)$ can be regarded as the similarity of instance $b_{ni}$ to target concept $t_k$. Assuming that each $t_k$ is characterized by a representative feature vector (e.g., a centroid), $c_k$, we let

$$\Pr(b_{ni} \in t_k) = \exp\Big(-\sum_{f=1}^{F} s_{kf}^2 \,(b_{nif} - c_{kf})^2\Big). \tag{11}$$

In (11), $s_k = [s_{k1}, \dots, s_{kF}]$ is a vector of scaling parameters that weight the role individual features play in defining the overall similarity [9]. Using (11), finding the optimal target concepts reduces to finding their optimal centers $c_k$ and scales $s_k$ for $k = 1, \dots, K$. Thus, we need to solve


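$$\frac{\partial J}{\partial c_k} = -\sum_{n=1}^{N} \frac{u_{kn}^m}{\Pr(t_k \mid B_n)} \frac{\partial \Pr(t_k \mid B_n)}{\partial c_k} = 0 \tag{12}$$

and

$$\frac{\partial J}{\partial s_k} = -\sum_{n=1}^{N} \frac{u_{kn}^m}{\Pr(t_k \mid B_n)} \frac{\partial \Pr(t_k \mid B_n)}{\partial s_k} = 0. \tag{13}$$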
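The quantities in (10) through (13) can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation, that evaluates the similarity (11) and the noisy-OR (10) for one concept and assembles the left-hand sides of (12) and (13) by the chain rule. The array shapes, the toy data, and the helper names (`instance_prob`, `bag_prob`, `concept_objective`, `concept_grads`, `numeric_grad`) are illustrative assumptions. A central finite-difference comparison serves as a sanity check.

```python
import numpy as np

def instance_prob(bags, c, s):
    """Eq. (11) for every instance: p[n, i] = Pr(b_ni in t_k).
    bags: (N, I, F); c, s: (F,) center and scales of one concept t_k."""
    return np.exp(-np.sum(s**2 * (bags - c) ** 2, axis=2))

def bag_prob(bags, labels, c, s):
    """Eq. (10), noisy-OR: 1 - prod_i (1 - p_ni) if label 1, prod_i (1 - p_ni) if label 0."""
    q = np.prod(1.0 - instance_prob(bags, c, s), axis=1)
    return np.where(labels == 1, 1.0 - q, q)

def concept_objective(bags, labels, u_m, c, s):
    """Terms of J in Eq. (3) that depend on concept t_k; u_m[n] = u_kn ** m."""
    return np.sum(u_m * -np.log(bag_prob(bags, labels, c, s)))

def concept_grads(bags, labels, u_m, c, s):
    """dJ/dc_k and dJ/ds_k, the left-hand sides of Eqs. (12) and (13)."""
    diff = bags - c                                   # (N, I, F)
    p = instance_prob(bags, c, s)                     # (N, I)
    dp_dc = 2 * s**2 * diff * p[..., None]            # dp_ni / dc_kf
    dp_ds = -2 * s * diff**2 * p[..., None]           # dp_ni / ds_kf
    one_minus = 1.0 - p
    q = np.prod(one_minus, axis=1)                    # prod_i (1 - p_ni)
    others = q[:, None] / one_minus                   # prod_{j != i} (1 - p_nj); needs p_ni < 1
    sign = np.where(labels == 1, 1.0, -1.0)           # dPr/dp_ni = +others (pos), -others (neg)
    P = np.where(labels == 1, 1.0 - q, q)             # Pr(t_k | B_n), Eq. (10)
    coef = -u_m * sign / P                            # d(-log Pr)/dPr = -1/Pr
    g_c = np.einsum("n,ni,nif->f", coef, others, dp_dc)
    g_s = np.einsum("n,ni,nif->f", coef, others, dp_ds)
    return g_c, g_s

def numeric_grad(fun, x, h=1e-6):
    """Central finite differences of a scalar function fun at x."""
    return np.array([(fun(x + h * e) - fun(x - h * e)) / (2 * h) for e in np.eye(len(x))])

# Toy check with arbitrary sizes: analytic gradients vs. finite differences.
rng = np.random.default_rng(1)
N, I, F = 6, 4, 3
bags = rng.normal(size=(N, I, F))
labels = np.array([1, 1, 1, 0, 0, 0])
u_m = rng.uniform(0.1, 1.0, size=N)
c, s = rng.normal(size=F), np.full(F, 0.5)
g_c, g_s = concept_grads(bags, labels, u_m, c, s)
err_c = np.max(np.abs(g_c - numeric_grad(lambda x: concept_objective(bags, labels, u_m, x, s), c)))
err_s = np.max(np.abs(g_s - numeric_grad(lambda x: concept_objective(bags, labels, u_m, c, x), s)))
print(err_c, err_s)   # both should be tiny
```

Because the paper updates $c_k$ and $s_k$ with a line search as in [9], gradients like these would supply the search directions. The sketch stops before that step.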


Since the definition of $\Pr(t_k \mid B_n)$ depends on whether $B_n$ is a positive or a negative bag, we rewrite (12) and (13) as

$$\frac{\partial J}{\partial c_k} = -\sum_{B_n \in \mathcal{B}^+} \frac{u_{kn}^m}{\Pr(t_k \mid B_n^+)} \frac{\partial \Pr(t_k \mid B_n^+)}{\partial c_k} - \sum_{B_n \in \mathcal{B}^-} \frac{u_{kn}^m}{\Pr(t_k \mid B_n^-)} \frac{\partial \Pr(t_k \mid B_n^-)}{\partial c_k} \tag{14}$$

and

$$\frac{\partial J}{\partial s_k} = -\sum_{B_n \in \mathcal{B}^+} \frac{u_{kn}^m}{\Pr(t_k \mid B_n^+)} \frac{\partial \Pr(t_k \mid B_n^+)}{\partial s_k} - \sum_{B_n \in \mathcal{B}^-} \frac{u_{kn}^m}{\Pr(t_k \mid B_n^-)} \frac{\partial \Pr(t_k \mid B_n^-)}{\partial s_k}. \tag{15}$$

Using (10), it can be shown that

$$\frac{\partial \Pr(t_k \mid B_n^+)}{\partial c_k} = \sum_{i=1}^{I} \frac{1}{1 - \Pr(b_{ni}^+ \in t_k)} \frac{\partial \Pr(b_{ni}^+ \in t_k)}{\partial c_k} \prod_{j=1}^{I} \big(1 - \Pr(b_{nj}^+ \in t_k)\big) \tag{16}$$

and

$$\frac{\partial \Pr(t_k \mid B_n^-)}{\partial c_k} = -\sum_{i=1}^{I} \frac{1}{1 - \Pr(b_{ni}^- \in t_k)} \frac{\partial \Pr(b_{ni}^- \in t_k)}{\partial c_k} \prod_{j=1}^{I} \big(1 - \Pr(b_{nj}^- \in t_k)\big). \tag{17}$$

Similar equations can be derived for $\partial \Pr(t_k \mid B_n)/\partial s_k$ by substituting $s_k$ for $c_k$ in (16) and (17). Using (11), the partial derivatives in (16) and (17) (and the equivalent equations for the scales) can be computed using

$$\frac{\partial \Pr(b_{ni} \in t_k)}{\partial c_{kf}} = 2\, s_{kf}^2 \,(b_{nif} - c_{kf})\, e^{-\sum_{j=1}^{F} s_{kj}^2 (b_{nij} - c_{kj})^2} \tag{18}$$

and

$$\frac{\partial \Pr(b_{ni} \in t_k)}{\partial s_{kf}} = -2\, s_{kf} \,(b_{nif} - c_{kf})^2\, e^{-\sum_{j=1}^{F} s_{kj}^2 (b_{nij} - c_{kj})^2}. \tag{19}$$

Equations (12) and (13) have no closed-form solution. Instead, we use approximate solutions based on an iterative line search approach, as in [9]. The resulting FCMI algorithm is outlined below.

Algorithm 1 The FCMI Algorithm

Inputs: $\mathcal{B}^+$ and $\mathcal{B}^-$: the sets of positive and negative bags.
        $K$: the number of target concepts.
Outputs: $C$: centers of the $K$ target concepts.
         $S$: scales of the $K$ target concepts.
         $U$: memberships of all bags in all target concepts.

Initialize $c_k$ and $s_k$ for $k = 1, \dots, K$
repeat
    Update $u_{kn}$ using (9).
    Update $C$ and $S$ by performing a few iterations of a line search algorithm that minimizes $J$ using the gradients in (12) and (13).
until the centers do not change significantly or the maximum number of iterations is exceeded
return $C$, $S$, $U$


IV. Experimental Results

The proposed FCMI algorithm was applied to analyze data of buried landmines collected using a Ground Penetrating Radar (GPR) sensor. The data were collected using a NIITEK vehicle-mounted GPR system [22] from outdoor test lanes at two different locations. The first location was a temperate region with significant rainfall, whereas the second was a desert region. The lanes at both locations are simulated roads with known mine locations. All mines are Anti-Tank (AT) mines that can be classified into 2 categories: anti-tank metal (ATM) and anti-tank with low metal content (ATLM). All mines are buried 0” to 8” under the surface. Multiple data collections were performed at each site on different dates, resulting in a large and diverse collection of mine and false alarm signatures. False alarms arise from radar signals that present a mine-like character. Such signals are generally attributed to clutter. Each sample, or “alarm,” in the dataset has a corresponding datacube with dimensions representing depth (500 depth bins), down-track (15 frames or scans), and cross-track (15 channels). Using the ground truth, each sample is labeled as mine or clutter. The true depth location is unknown. For our experiment, we use a subset of the data that has 400 mine samples and 400 clutter samples.


Each alarm is divided into 15 overlapping windows along the depth. From each window (50 depths × 15 scans × 15 channels), we extract Edge Histogram Descriptors (EHD) [4]. We extract a 35-dim EHD feature from the (depth, down-track) dimensions at the central channel and another 35-dim EHD feature from the (depth, cross-track) dimensions at the central scan. The 2 EHDs are concatenated to form a 70-dim feature vector. To fit this data into the MIL framework, each alarm is represented by a bag of 15 instances, where each instance is a 70-dim feature vector. Each bag is labeled as positive (mine) or negative (clutter). Labels at the instance level are not available. We only know that a positive bag has one or more instances that exhibit the signature of a mine.
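The bag construction just described can be sketched as follows, assuming a datacube array of shape (depth, scans, channels). Two parts are assumptions: the window placement (evenly spaced starts, since the stride is not stated) and the `block_means` feature, which is a hypothetical placeholder that only reproduces the 35 + 35 = 70 dimensions. The EHD of [4] is not reproduced here.

```python
import numpy as np

DEPTH, SCANS, CHANNELS = 500, 15, 15   # datacube size stated in the text
N_WIN, WIN_DEPTH = 15, 50              # 15 overlapping windows of 50 depth bins

def window_starts(depth=DEPTH, n_win=N_WIN, win=WIN_DEPTH):
    # Assumption: evenly spaced windows from the top to the bottom of the cube
    # (the text does not state the stride).
    return np.linspace(0, depth - win, n_win).round().astype(int)

def block_means(img, rows=7, cols=5):
    # Hypothetical stand-in for a 35-dim EHD [4]: a 7 x 5 grid of block means.
    # It only matches the dimensionality; it is NOT an edge histogram descriptor.
    return np.array([blk.mean()
                     for band in np.array_split(img, rows, axis=0)
                     for blk in np.array_split(band, cols, axis=1)])

def alarm_to_bag(cube):
    """cube: (depth, scans, channels) -> bag of shape (N_WIN, 70), one row per window."""
    mid_scan, mid_chan = cube.shape[1] // 2, cube.shape[2] // 2
    instances = []
    for z in window_starts(cube.shape[0]):
        w = cube[z:z + WIN_DEPTH]                 # (50, scans, channels)
        down_track = w[:, :, mid_chan]            # (depth, down-track) at the central channel
        cross_track = w[:, mid_scan, :]           # (depth, cross-track) at the central scan
        instances.append(np.concatenate([block_means(down_track), block_means(cross_track)]))
    return np.stack(instances)

bag = alarm_to_bag(np.random.default_rng(0).normal(size=(DEPTH, SCANS, CHANNELS)))
print(bag.shape)  # (15, 70)
```

Swapping in a real EHD implementation would only change `block_means`. The windowing and the bag layout stay the same.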

The proposed FCMI algorithm assumes that the number of target concepts is given. In our experiment, we assume that $K = 3$. We initialize the centers of the target concepts using the following heuristic. First, using all instances from all positive bags, we select several candidate centers that cover most of the instance feature space. These candidates correspond to instances that are as distant from each other as possible. Next, for each candidate, we identify its 50 nearest neighbors among all instances (positive and negative). Out of all candidates, we select the 3 that have the largest ratio of neighbors from positive bags to neighbors from negative bags. The scales of each target concept are initialized as the inverse of the standard deviation of all instances identified as its nearest neighbors.
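The initialization heuristic above could be implemented roughly as follows. This is a sketch under stated assumptions. "As distant from each other as possible" is realized here with greedy farthest-point sampling, a technique we chose ourselves that the text does not name. The candidate count `n_candidates`, the `max(., 1)` guard in the ratio, and the `eps` in the scales are hypothetical settings.

```python
import numpy as np

def init_concepts(pos_inst, neg_inst, K=3, n_candidates=20, n_neighbors=50, eps=1e-8):
    """pos_inst: (P, F) instances of all positive bags; neg_inst: (M, F) instances
    of all negative bags. Returns centers (K, F) and scales (K, F)."""
    # 1) Spread-out candidates among positive instances via greedy farthest-point
    #    sampling (assumed procedure; n_candidates is a hypothetical setting).
    idx = [0]
    d_min = np.linalg.norm(pos_inst - pos_inst[0], axis=1)
    for _ in range(n_candidates - 1):
        nxt = int(np.argmax(d_min))
        idx.append(nxt)
        d_min = np.minimum(d_min, np.linalg.norm(pos_inst - pos_inst[nxt], axis=1))
    cands = pos_inst[idx]
    # 2) Nearest neighbors of each candidate among all instances (positive and negative).
    all_inst = np.vstack([pos_inst, neg_inst])
    from_pos = np.r_[np.ones(len(pos_inst), bool), np.zeros(len(neg_inst), bool)]
    dist = np.linalg.norm(cands[:, None, :] - all_inst[None, :, :], axis=2)
    nn = np.argsort(dist, axis=1)[:, :n_neighbors]
    # 3) Keep the K candidates with the largest positive-to-negative neighbor ratio
    #    (max(., 1) guards against division by zero; an assumption).
    n_pos = from_pos[nn].sum(axis=1)
    ratio = n_pos / np.maximum(n_neighbors - n_pos, 1)
    best = np.argsort(-ratio, kind="stable")[:K]
    # 4) Scales: inverse standard deviation of each chosen candidate's neighbors.
    centers = cands[best]
    scales = 1.0 / (all_inst[nn[best]].std(axis=1) + eps)
    return centers, scales

# Toy usage: a tight positive cluster near (5, 5) plus positive "background"
# instances mixed with negatives around the origin.
rng = np.random.default_rng(0)
pos = np.vstack([rng.normal(5.0, 0.3, size=(60, 2)), rng.normal(0.0, 1.0, size=(60, 2))])
neg = rng.normal(0.0, 1.0, size=(200, 2))
C, S = init_concepts(pos, neg, K=3)
```

In the toy data, the candidate with the purest positive neighborhood lies in the cluster near (5, 5), which is the region a target concept should start from.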

In figure 4, we display a scatter plot of the features of all instances of all bags. Here, for the purpose of visualization, we project the 70-dim data onto its first 2 principal components. In this figure, we also display the initial centers of the 3 target concepts, the final centers after convergence, and the path of each center as FCMI iterates. For this data, we fix the fuzzifier $m$ to 1.5, and at each FCMI iteration we run the line search (to update the centers and scales) for 5 iterations. First, we note that some features from instances of positive bags are clustered away (top left of the figure) from negative instances (on the right side). Typically, these correspond to instances extracted from the “correct” depth. Other instances, on the other hand, are located around instances from negative bags. These correspond to instances extracted from the background part of the positive bags. Second, we note that the 3 centers converged to dense red regions (instances from positive bags) and away from dense blue regions (instances from negative bags).
Fig. 4. Scatter plot of the 2 principal components of the instance feature space. Instances of positive bags are displayed as red ‘x’ and instances of negative bags are displayed as blue ‘o’. The locations of the initial (final) centers are shown by circles (squares).

Recall  that  labels  at  the  instance  level  are  not  available 
and  that  positive  bags  include  at  least  one  positive  instance. 


After running FCMI, we use the following simple steps to identify positive instances within positive bags. Assume that bag $B_n$ is assigned to target concept $t_k$ (i.e., $u_{kn} = \max_{j = 1, \dots, K} u_{jn}$). The likelihood of each instance, $b_{ni}$, of bag $B_n$ in target concept $t_k$ can be computed using (11). The most likely positive instance is the one that has the largest likelihood (multiple positive instances could be identified using a threshold). To verify that FCMI was able to identify the relevant instances within positive bags, in figure 5 we display a few mine alarms in which we highlight the window of the most likely instance. As can be seen, this window corresponds to the strongest part of the mine signature.
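The instance-labeling steps above amount to an argmax over memberships followed by an argmax over (11). Here is a minimal sketch, assuming learned `centers` and `scales` of shape (K, F) and the membership column `u_col` of one bag. The toy values and the threshold `tau` are hypothetical.

```python
import numpy as np

def most_likely_instance(bag, u_col, centers, scales):
    """bag: (I, F); u_col: (K,) memberships u_kn of the bag; centers, scales: (K, F).
    Assign the bag to the concept with the largest membership, score every instance
    with Eq. (11) under that concept, and return (k, best instance index, scores)."""
    k = int(np.argmax(u_col))
    lik = np.exp(-np.sum(scales[k] ** 2 * (bag - centers[k]) ** 2, axis=1))
    return k, int(np.argmax(lik)), lik

centers = np.array([[0.0, 0.0], [5.0, 5.0]])
scales = np.ones((2, 2))
bag = np.array([[9.0, 9.0], [5.1, 5.0], [0.2, 0.0]])
k, i, lik = most_likely_instance(bag, np.array([0.2, 0.8]), centers, scales)
tau = 0.5                                   # hypothetical threshold for multiple positives
positives = np.flatnonzero(lik >= tau)
print(k, i, positives)  # 1 1 [1]
```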
Fig. 5. GPR signatures of three different alarms. Each alarm is represented by a bag of 15 instances extracted at different depths. The most likely instance of each bag is highlighted.

To illustrate the need to identify multiple target concepts, in figure 6 we display samples from the 3 target concepts identified by FCMI. For each target concept, we display 3 typical instances. As can be seen, target concept 1 corresponds to large mines with strong energy. Most of the mines assigned to this concept are large and buried no more than 3” deep. Target concept 2 corresponds to large mines with weak energy. These are typically large mines buried deeper than 3”. Target concept 3 corresponds to mines with narrower signatures. These are typically smaller mines.

(a) Sample instances from bags assigned to target concept 1
(b) Sample instances from bags assigned to target concept 2
(c) Sample instances from bags assigned to target concept 3
(horizontal axes: Scans)

Fig. 6. Representative instances from the 3 target concepts identified by FCMI.

The proposed fuzzy approach has several advantages. First, at the bag level, each bag $B_n$ belongs to each target concept $t_k$ with a fuzzy membership degree $u_{kn}$. This is the standard advantage that fuzzy clustering methods have over crisp clustering. Second, and more importantly, fuzzy memberships can provide more detailed information at the instance level. Specifically, a bag $B_n$ can have a relatively high membership in concept $k_1$, $u_{k_1 n}$, because one of its instances is close to concept $k_1$. Similarly, the same bag can have a non-zero membership in another concept $k_2$, $u_{k_2 n}$, either because the same instance is close to concept $k_2$ or because a different instance is close to concept $k_2$. Distinguishing between these two cases can provide a richer description of the data. In figure 7, we illustrate these two scenarios. In figure 7(a), the bag has 0.68 membership in concept 2 (large mines with weak energy; see figure 6(b)) and 0.31 membership in concept 3 (small mines with narrow signatures; see figure 6(c)). In this case, the same instance has the highest likelihood in both concepts. In figure 7(b), the bag has 0.57 membership in concept 1 (large mines with strong energy; see figure 6(a)), 0.15 membership in concept 2, and 0.28 membership in concept 3. In this case, different instances have the highest likelihood in the 3 concepts. For instance, the shallower instance is similar to concept 1, while the deeper instance is similar to concept 3.
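The distinction between the two scenarios of figure 7 can be computed directly from (11). The sketch below is one possible way to do it, assuming learned `centers` and `scales` of shape (K, F). The membership cut-off `min_u`, which decides which concepts count as "non-zero", is a hypothetical parameter, and the toy inputs are made up.

```python
import numpy as np

def describe_bag(bag, u_col, centers, scales, min_u=0.1):
    """bag: (I, F); u_col: (K,) memberships u_kn; centers, scales: (K, F).
    For every concept whose membership is at least min_u (a hypothetical cut-off),
    return the index of the bag's most likely instance under Eq. (11), and whether
    a single instance is responsible for all of those memberships."""
    diff = bag[None, :, :] - centers[:, None, :]                      # (K, I, F)
    lik = np.exp(-np.sum(scales[:, None, :] ** 2 * diff**2, axis=2))  # (K, I)
    best = {k: int(np.argmax(lik[k])) for k in range(len(u_col)) if u_col[k] >= min_u}
    same_instance = len(set(best.values())) == 1
    return best, same_instance

centers = np.array([[0.0, 0.0], [3.0, 0.0]])
scales = np.ones((2, 2))
# Different instances explain the two memberships (like figure 7(b)).
best, same = describe_bag(np.array([[0.0, 0.0], [3.0, 0.0], [10.0, 10.0]]),
                          np.array([0.6, 0.4]), centers, scales)
print(best, same)  # {0: 0, 1: 1} False
```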

(a) An alarm that has 0.68 fuzzy membership in concept 2 (left) and 0.31 membership in concept 3 (right). The instances with the highest likelihood are extracted from the same depth location.
(b) An alarm that has 0.57 fuzzy membership in concept 1, 0.15 membership in concept 2, and 0.28 membership in concept 3. The instances with the highest likelihood, which contribute to these memberships, are extracted from different depth locations.
(horizontal axes: Scans)

Fig. 7. Sample alarms with high fuzzy memberships in more than one concept.

V. Conclusions

We proposed an algorithm to identify multiple target concepts in multiple instance data. First, we defined a fuzzy Multi-target concept Diverse Density (MDD) metric. Then, we derived the necessary conditions to optimize this MDD and developed the Fuzzy Clustering of Multiple Instance data (FCMI) algorithm. The FCMI algorithm identifies K target concepts simultaneously. Each target concept corresponds to a dense region in the instance feature space with maximal correlation to instances from positive samples and minimal correlation to instances from negative samples.

Using data of buried landmines collected with a ground penetrating radar sensor, we showed that the proposed FCMI algorithm can identify distinct target concepts that correspond to mines of different types buried at different depths. A bag can be similar to multiple target concepts, either through the same instance or through different instances. Thus, FCMI can be used to provide a rich description of the data at the instance level.

In this paper, we provided only a qualitative evaluation of the proposed FCMI. A quantitative evaluation would require building an additional layer that performs classification. For instance, the identified target concepts could be used to map the multiple instance data as in [12]. We are investigating this research direction. Another research direction that we are currently investigating is the development of a possibilistic [23] version of FCMI. Since multiple instance data can have a large number of negative bags, and even positive bags can have a large number of irrelevant instances, a possibilistic version of FCMI could make it more robust.